Hypercube LSH for Approximate Near Neighbors
Author
Abstract
A celebrated technique for finding near neighbors for the angular distance involves using a set of random hyperplanes to partition the space into hash regions [Charikar, STOC 2002]. Experiments later showed that using a set of orthogonal hyperplanes, thereby partitioning the space into the Voronoi regions induced by a hypercube, leads to even better results [Terasawa and Tanaka, WADS 2007]. However, no theoretical explanation for this improvement was ever given, and it remained unclear how the resulting hypercube hash method scales in high dimensions. In this work, we provide explicit asymptotics for the collision probabilities when using hypercubes to partition the space. For instance, two near-orthogonal vectors are expected to collide with probability (1/π)^{d+o(d)} in dimension d, compared to (1/2)^{d+o(d)} when using random hyperplanes. Vectors at angle π/3 collide with probability (√3/π)^{d+o(d)}, compared to (2/3)^{d+o(d)} for random hyperplanes, and near-parallel vectors collide with similar asymptotic probabilities in both cases. For c-approximate nearest neighbor searching, this translates to a decrease in the exponent ρ of locality-sensitive hashing (LSH) methods by a factor of up to log₂(π) ≈ 1.652 compared to hyperplane LSH. For c = 2, we obtain ρ ≈ 0.302 + o(1) for hypercube LSH, improving upon the ρ ≈ 0.377 for hyperplane LSH. We further describe how to use hypercube LSH in practice, and we consider an example application in the area of lattice algorithms.
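As a rough illustration of the two hash families compared above, the toy sketch below (an assumption of this summary, not the authors' code) implements hyperplane LSH as the sign pattern of d Gaussian projections and hypercube LSH as the orthant of a randomly rotated vector (i.e. the nearest hypercube vertex). The asymptotics above predict that, as d grows, the hypercube variant collides noticeably less often at angle π/3 while behaving similarly for near-parallel vectors, which is what makes it a better separator of near and far points.

```python
import numpy as np

rng = np.random.default_rng(42)
d = 8  # illustrative dimension

def random_rotation(d):
    # QR decomposition of a Gaussian matrix yields a Haar-random orthogonal matrix
    # (the sign fix on the diagonal of R makes the distribution uniform).
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

G = rng.standard_normal((d, d))  # d independent random hyperplanes
Q = random_rotation(d)           # one random rotation for the hypercube

def hyperplane_hash(x):
    # Charikar's hyperplane LSH: signs of d random Gaussian projections.
    return tuple(np.sign(G @ x).astype(int))

def hypercube_hash(x):
    # Hypercube LSH: rotate randomly, then take the orthant (sign pattern),
    # which identifies the nearest vertex of the hypercube {-1, +1}^d.
    return tuple(np.sign(Q @ x).astype(int))

def collision_rate(hash_fn, angle, trials=20000):
    # Empirical probability that two unit vectors at the given angle collide.
    hits = 0
    for _ in range(trials):
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)
        z = rng.standard_normal(d)
        z -= (z @ x) * x
        z /= np.linalg.norm(z)  # unit vector orthogonal to x
        y = np.cos(angle) * x + np.sin(angle) * z
        hits += hash_fn(x) == hash_fn(y)
    return hits / trials

for name, h in [("hyperplane", hyperplane_hash), ("hypercube", hypercube_hash)]:
    near = collision_rate(h, 0.1)
    far = collision_rate(h, np.pi / 3)
    print(f"{name}: p(0.1 rad) = {near:.3f}, p(pi/3) = {far:.3f}")
```

At this small d the printed rates only hint at the asymptotic gap; the (√3/π)^{d+o(d)} versus (2/3)^{d+o(d)} separation becomes pronounced as the dimension grows.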
Similar resources
Multi-Level Spherical Locality Sensitive Hashing For Approximate Near Neighbors
This paper introduces "Multi-Level Spherical LSH": a parameter-free, multi-level, data-dependent locality-sensitive hashing data structure for solving the Approximate Near Neighbors problem (ANN). This data structure is a modified version of the multi-probe adaptive querying algorithm, with the potential of achieving an O(n^ρ + t) query run time for all inputs n where t ≤ n. Keywords—Locality Sensitiv...
Practical linear-space Approximate Near Neighbors in high dimension
The c-approximate Near Neighbor problem in high dimensional spaces has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, query time sublinear in the size of the dataset, and subquadratic space requirement. For practical applications, linear space is typically imperative. Most previous work in the linear space regime focuses on the ca...
A Refined Analysis of LSH for Well-dispersed Data Points
Near neighbor problems are fundamental in algorithms for high-dimensional Euclidean spaces. While classical approaches suffer from the curse of dimensionality, locality sensitive hashing (LSH) can effectively solve the α-approximate r-near neighbor problem, and has been proven to be optimal in the worst case. However, for real-world data sets, LSH can naturally benefit from well-dispersed data and ...
Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space
We study the r-near neighbors reporting problem (rNNR) (or spherical range reporting), i.e., reporting all points in a high-dimensional point set S that lie within a radius r of a given query point. This problem has served as a building block in finding near-duplicate web pages, solving k-diverse near neighbor search, and content-based image retrieval. Our approach builds upon the loca...
Song Intersection by Approximate Nearest Neighbor Search
We present new methods for computing inter-song similarities using intersections between multiple audio pieces. The intersection contains portions of two different musical recordings that are similar, for example when one song is a derivative work of the other. To scale our search to large song databases we have developed an algorithm based on locality-sensitive hashing (LSH) of sequences of au...
Journal title:
Volume  Issue
Pages  -
Publication date: 2017